A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning

Abstract

Remote sensing image captioning aims to describe the content of images using natural language. In contrast with natural images, the scale, distribution, and number of objects generally vary in remote sensing images, making it hard to capture global semantic information and the relationships between objects at different scales. In this paper, in order to improve the accuracy and diversity of captioning, a mask-guided Transformer network with a topic token is proposed. Multi-head attention is introduced to extract features and capture the relationships between objects. On this basis, a topic token is added into the encoder, which represents the scene topic and serves as a prior in the decoder to help us focus better on global semantic information. Moreover, a new Mask-Cross-Entropy strategy is designed to improve the diversity of the generated captions, which randomly replaces some input words with a special word (named [Mask]) in the training stage, with the aim of enhancing the model’s learning ability and forcing exploration of uncommon word relations. Experiments on three data sets show that the proposed method can generate captions with high accuracy and diversity, and the experimental results illustrate that the proposed method can outperform state-of-the-art models. Furthermore, the CIDEr score on the RSICD data set increased from 275.49 to 298.39.
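
The random word replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_input_words`, the masking probability `mask_prob`, the `<bos>`/`<eos>` markers, and the choice to leave the prediction targets unmasked are all assumptions made for illustration. Only the special word `[Mask]` and the idea of replacing some input words at random during training come from the abstract.

```python
import random

MASK_TOKEN = "[Mask]"  # the special word named in the abstract


def mask_input_words(words, mask_prob=0.15, rng=None, protected=("<bos>", "<eos>")):
    """Return a copy of `words` with some entries replaced by MASK_TOKEN.

    `mask_prob` and `protected` are illustrative assumptions; the abstract
    does not state a masking rate or which tokens are exempt.
    """
    rng = rng or random.Random()
    masked = []
    for word in words:
        # Each unprotected word is replaced independently with probability mask_prob.
        if word not in protected and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
        else:
            masked.append(word)
    return masked


# Hypothetical training example: the decoder sees the masked caption,
# while the loss targets (assumed here) remain the original words.
caption = ["<bos>", "many", "planes", "are", "parked", "near", "a", "terminal", "<eos>"]
decoder_input = mask_input_words(caption, mask_prob=0.3, rng=random.Random(0))
targets = caption[1:]

print(decoder_input)
print(targets)
```

Under these assumptions, the decoder can no longer rely on every preceding word being present, which is one way to read the abstract's stated aim of forcing the exploration of uncommon word relations.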

Similar resources

Remote Sensing Image Dehazing using Guided Filter

Remote sensing image dehazing is challenging because it is massively ill-posed and the haze depends on unknown depth information. Haze removal based on the dark channel prior is effective, and refining the transmission map with a Gaussian filter produces a good result. However, the naturalness, sharpness, and effectiveness of the images still need improvement, and fine haze remains to be removed. So Proposes new a...

Text-Guided Attention Model for Image Captioning

Visual attention plays an important role in understanding images and has proven effective in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...

Context guided belief propagation for remote sensing image classification.

We propose a context guided belief propagation (BP) algorithm to perform high spatial resolution multispectral imagery (HSRMI) classification efficiently utilizing superpixel representation. One important characteristic of HSRMI is that different land cover objects possess a similar spectral property. This property is exploited to speed up the standard BP (SBP) in the classification process. Sp...

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real-world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-tr...

Remote Sensing Image Analysis via a Texture Classification Neural Network

In this work we apply a texture classification network to remote sensing image analysis. The goal is to extract the characteristics of the area depicted in the input image, thus achieving a segmented map of the region. We have recently proposed a combined neural network and rule-based framework for texture recognition. The framework uses unsupervised and supervised learning, and provides probab...

Journal

Journal title: Remote Sensing

Year: 2022

ISSN: 2315-4632, 2315-4675

DOI: https://doi.org/10.3390/rs14122939